10 research outputs found

    Analysis and simulation of data prefetching algorithms for last-level cache memory

    Analysis and comparison of one of the latest data prefetching algorithms in terms of performance, network utilization, and prefetching accuracy.

    Design and evaluation of the cache memories for a multicore chip powered at very low supply voltage

    Energy saving is a first-order goal in research on new processor designs. Lowering the supply voltage achieves this goal, but beyond a certain limit errors appear in the bit cells. The first component to fail is the last-level cache (LLC). There are numerous proposals to mitigate the performance loss caused by these errors, ranging from building the cells robustly so that the faults never occur, to complex architecture-level solutions that mitigate the impact of these faults on the final performance of the processor. The architectural solutions are diverse: simple ones such as block disabling, which disables the defective resources, and more complex ones such as bdot, which require modifying the coherence protocol. In addition, all of these techniques can be further optimized with replacement policies and modifications to the structure of the cache memories. This work first studies techniques to improve the operation of the LLC at low voltage and then analyzes a new research proposal consisting of a decoupled organization for the tag and data arrays. To this end, a detailed cycle-level simulator based on the Gem5 environment from the University of Michigan is used. The new proposal and other state-of-the-art schemes are modeled on a chip-multiprocessor system with a memory hierarchy formed by private caches and an LLC shared among the processors. In addition, single-processor and multiprocessor workloads based on SPEC-2k6 are created and used to compare the proposals in terms of miss rate and instructions executed per unit of time.
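    Block disabling, mentioned above as the simplest architectural mitigation, can be pictured with a minimal sketch (illustrative names and structures only, not the actual Gem5 models used in the work): frames whose bit cells fail at low voltage are marked defective and excluded from allocation, so each set simply operates with fewer ways.

    // Minimal sketch of block disabling in a set-associative LLC operating
    // below the safe voltage: frames whose cells fail are marked defective
    // and excluded from victim selection, shrinking the effective
    // associativity of the set. All names are illustrative.
    #include <cstdint>
    #include <iostream>
    #include <vector>

    struct Frame {
        bool faulty  = false;   // set from a fault map measured at low Vdd
        bool valid   = false;
        uint64_t tag = 0;
        uint64_t lru = 0;       // lower value = older
    };

    struct Set {
        std::vector<Frame> ways;
        explicit Set(int assoc) : ways(assoc) {}

        // Victim selection under block disabling: faulty ways are never candidates.
        int pickVictim() const {
            int victim = -1;
            for (int w = 0; w < (int)ways.size(); ++w) {
                if (ways[w].faulty) continue;  // disabled frame
                if (victim < 0 || ways[w].lru < ways[victim].lru) victim = w;
            }
            return victim;  // -1 would mean the whole set is disabled
        }

        int effectiveAssociativity() const {
            int n = 0;
            for (const auto& f : ways) n += !f.faulty;
            return n;
        }
    };

    int main() {
        Set set(8);
        set.ways[2].faulty = true;   // two defective frames at low voltage
        set.ways[5].faulty = true;
        std::cout << "effective ways: " << set.effectiveAssociativity()
                  << ", victim way: " << set.pickVictim() << "\n";
    }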

    STT-RAM memory hierarchy designs aimed to performance, reliability and energy consumption

    Current applications demand larger on-chip memory capacity since off-chip memory accesses become a bottleneck. However, achieving this by scaling down the transistor size of SRAM-based Last-Level Caches (LLCs) may become prohibitive in terms of cost, area and energy. Therefore, other technologies such as STT-RAM are becoming real alternatives for building the LLC in multicore systems. Although STT-RAM bitcells feature high density and low static power, they suffer from other trade-offs. On the one hand, STT-RAM writes are more expensive than STT-RAM reads and SRAM writes. To address this asymmetry, we will propose microarchitectural techniques to minimize the number of write operations on STT-RAM cells. On the other hand, reliability also plays an important role: STT-RAM cells suffer from three types of errors: write, read disturbance, and retention errors. To cope with these, we will suggest techniques to manage redundant information, allowing error detection and information recovery.
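    The abstract does not spell out which write-reduction techniques are proposed; one commonly cited approach in this area is a differential (read-before-write) scheme that only flips the bitcells whose value actually changes. A minimal sketch of that generic idea, with illustrative names, is shown below; it is not necessarily the mechanism used in this work.

    // Illustrative sketch of a differential write: read the stored word,
    // XOR it with the incoming data, and only write the bitcells that differ.
    #include <bitset>
    #include <cstdint>
    #include <iostream>

    constexpr int kLineBits = 64;  // one word of a cache line, for brevity

    // Returns how many bitcells must actually be written (i.e. flipped).
    int differentialWrite(std::bitset<kLineBits>& stored,
                          const std::bitset<kLineBits>& incoming) {
        std::bitset<kLineBits> diff = stored ^ incoming;  // bits that differ
        stored = incoming;                                // commit new value
        return (int)diff.count();                         // cells actually written
    }

    int main() {
        std::bitset<kLineBits> cell(0xFF00FF00FF00FF00ULL);
        std::bitset<kLineBits> newData(0xFF00FF00FF00FF0FULL);
        std::cout << "bitcells flipped: " << differentialWrite(cell, newData)
                  << " of " << kLineBits << "\n";  // 4 instead of 64
    }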

    Compression-aware and performance-efficient insertion policies for long-lasting hybrid LLCs

    Emerging non-volatile memory (NVM) technologies can potentially replace large SRAM memories such as the last-level cache (LLC). However, despite recent advances, NVMs suffer from higher write latency and limited write endurance. Recently, NVM-SRAM hybrid LLCs have been proposed to combine the best of both worlds. Several policies have been proposed to improve the performance and lifetime of hybrid LLCs by intelligently steering the incoming LLC blocks into either the SRAM or the NVM part, based on the cache behavior of the LLC blocks and the SRAM/NVM device properties. However, these policies consider neither compressing the contents of the cache block nor using partially worn-out NVM cache blocks. This paper proposes new insertion policies for byte-level fault-tolerant hybrid LLCs that collaboratively optimize for lifetime and performance. Specifically, we leverage data compression to utilize partially defective NVM cache entries, thereby improving the LLC hit rate. The key to our approach is to guide the insertion policy by both the reuse properties of the block and the size resulting from its compression. A block is inserted in NVM only if it is a read-reuse block or its compressed size is lower than a threshold. It will be inserted in SRAM if the block is a write-reuse block or its compressed size is greater than the threshold. We use set-dueling to tune the compression threshold at runtime. This compression threshold provides a knob to control the NVM write rate and, together with a rule-based mechanism, allows balancing performance and lifetime. Overall, our evaluation shows that, with affordable hardware overheads, the proposed schemes can nearly reach the performance of an SRAM cache with the same associativity while improving lifetime by 17× compared to a hybrid NVM-unaware LLC. Our proposed scheme outperforms the state-of-the-art insertion policies by 9% while achieving a comparable lifetime. The rule-based mechanism shows that by compromising, for instance, 1.1% and 1.9% of performance, the NVM lifetime can be further increased by 28% and 44%, respectively. This work was partially funded by the HiPEAC collaboration grant 2020, the Center for Advancing Electronics Dresden (cfaed), the German Research Council (DFG) through the HetCIM project (502388442) under the Priority Program on ‘Disruptive Memory Technologies’ (SPP 2377), and from grants (1) PID2019-105660RB-C21 and PID2019-107255GB-C22/AEI/10.13039/501100011033 from Agencia Estatal de Investigación (AEI), and (2) gaZ: T5820R research group from Dept. of Science, University and Knowledge Society, Government of Aragon.
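    A minimal sketch of the insertion rule described above follows. The names are illustrative, the threshold is fixed here (set-dueling would tune it at runtime in the actual design), and the tie-break order between the two conditions is an assumption of this example.

    // Sketch of the compression-aware insertion decision: a block goes to the
    // NVM part if it shows read reuse or compresses below a threshold, and to
    // the SRAM part if it shows write reuse or compresses above the threshold.
    #include <cstddef>
    #include <iostream>

    enum class Target { NVM, SRAM };

    struct BlockInfo {
        bool   readReuse;        // predicted to be re-read before eviction
        bool   writeReuse;       // predicted to be re-written before eviction
        size_t compressedBytes;  // size after compressing the block
    };

    Target choosePartition(const BlockInfo& b, size_t thresholdBytes) {
        // Keep write-hot or poorly compressible blocks in SRAM; this priority
        // order is an assumption of the sketch.
        if (b.writeReuse || b.compressedBytes > thresholdBytes)
            return Target::SRAM;
        // Otherwise the block is read-reuse and/or compact enough for NVM.
        return Target::NVM;
    }

    int main() {
        size_t threshold = 32;  // bytes; set-dueling would adjust this online
        BlockInfo hotWrite{false, true, 20};
        BlockInfo compactRead{true, false, 16};
        std::cout << (choosePartition(hotWrite, threshold) == Target::SRAM) << " "
                  << (choosePartition(compactRead, threshold) == Target::NVM) << "\n";
    }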

    Leveraging data compression for performance-efficient and long-lasting NVM-based last-level cache

    Non-volatile memory (NVM) technologies are interesting alternatives for building on-chip Last-Level Caches (LLCs). Their advantages, compared to SRAM memory, are higher density and lower static power, but each write operation slightly wears out the bitcell, to the point of losing its storage capacity. In this context, this paper summarizes three contributions to the state of the art in NVM-based LLCs. Data compression reduces the size of the blocks and, together with wear-leveling mechanisms, can defer the wear-out of NVMs. Moreover, as capacity is reduced by write wear, data compression enables degraded cache frames to allocate blocks whose compressed size fits. Our first contribution is a microarchitecture design that leverages data compression and intra-frame wear-leveling to gracefully deal with NVM-LLC capacity degradation. The second contribution leverages this microarchitecture design to propose new insertion policies for hybrid LLCs using Set Dueling and taking into account the compression capabilities of the blocks. From a methodological point of view, although different approaches are used in the literature to analyze the degradation of an NV-LLC, none of them allows studying its temporal evolution in detail. In this sense, the third contribution is a forecasting procedure that combines detailed simulation and prediction, enabling an accurate analysis of the effect of different cache content mechanisms (replacement, wear leveling, compression, etc.) on the temporal evolution of the performance of multiprocessor systems employing such NVM-LLCs. Using this forecasting procedure, we show that the proposed NVM-LLC organizations and the insertion policies for hybrid LLCs significantly outperform the state of the art in both performance and lifetime metrics.
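    The point about degraded frames can be illustrated with a minimal sketch, assuming byte-level capacity loss and illustrative field names: a frame that has lost some byte cells can still host any block whose compressed size fits in the bytes that remain healthy.

    // Sketch of how compression keeps degraded NVM frames usable: a block is
    // allocated in a frame only if its compressed size fits in the healthy bytes.
    #include <cstddef>
    #include <iostream>

    struct NvmFrame {
        size_t totalBytes;     // nominal frame capacity, e.g. 64
        size_t wornOutBytes;   // byte cells that can no longer retain data

        size_t usableBytes() const { return totalBytes - wornOutBytes; }

        // An undamaged frame accepts anything up to totalBytes; a degraded
        // frame only accepts blocks that compress into the remaining bytes.
        bool canHost(size_t compressedSize) const {
            return compressedSize <= usableBytes();
        }
    };

    int main() {
        NvmFrame frame{64, 24};                    // 40 healthy bytes left
        std::cout << frame.canHost(64) << " "      // uncompressed block: no
                  << frame.canHost(36) << "\n";    // well-compressed block: yes
    }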

    Forecasting effective capacity and performance in a non-volatile last-level cache

    The write-induced degradation suffered by bitcells implemented with non-volatile memory (NVM) technologies is one of the main obstacles to building the last-level cache (LLC) with these technologies. Although the literature contains several proposals to cope with this degradation, the methodology used in previous work does not allow a detailed study of the evolution of the effective capacity or of the performance of a non-volatile last-level cache (NV-LLC). For this reason, this work proposes a forecasting procedure that combines simulation and prediction in order to study that evolution. In addition, compression is one of the techniques proposed in the literature to deal with the degradation of non-volatile memories. First, compression reduces the amount of information written to an NV-LLC. Second, when frames see their capacity diminished by degradation, compression allows them to remain functional by holding blocks of reduced size. The forecasting procedure developed in this work makes it possible to evaluate in detail the impact of different content-management techniques and mechanisms on the life expectancy and performance of an NV-LLC. The compression mechanism adopted in this work multiplies the life expectancy of an NV-LLC by up to 5. This work was funded by MINECO/AEI/FEDER (EU) (projects PID2019-105660RB-C21 and PID2019-107255GB-C22 / AEI / 10.13039/501100011033), the Government of Aragon (group T58 20R) and FEDER 2014-2020 “Construyendo Europa desde Aragón”.
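    As a toy illustration of combining simulation with prediction (the procedure in the work itself is more elaborate; the endurance value and the linear model here are assumptions of the example), per-frame write counts gathered over a short detailed-simulation window can be projected forward to estimate when each frame wears out.

    // Toy forecast: extrapolate per-frame write counts from one simulated
    // window to estimate how many windows remain before wear-out.
    #include <algorithm>
    #include <cstdint>
    #include <iostream>
    #include <vector>

    // Estimated windows until a frame wears out, given the writes it absorbed
    // in one window and its remaining endurance budget.
    double windowsUntilWearOut(uint64_t writesInWindow, uint64_t remainingEndurance) {
        if (writesInWindow == 0) return 1e18;  // effectively never, in this model
        return (double)remainingEndurance / (double)writesInWindow;
    }

    int main() {
        const uint64_t endurance = 10'000'000;                      // assumed writes per cell
        std::vector<uint64_t> writesPerFrame{1200, 300, 9500, 40};  // from simulation
        double firstFailure = 1e18;
        for (uint64_t w : writesPerFrame)
            firstFailure = std::min(firstFailure, windowsUntilWearOut(w, endurance));
        std::cout << "first frame predicted to fail after ~" << firstFailure
                  << " windows\n";
    }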

    HyCSim: A rapid design space exploration tool for emerging hybrid last-level caches

    Recent years have seen a rising trend in the exploration of non-volatile memory (NVM) technologies in the memory subsystem. Particularly in the cache hierarchy, hybrid last-level cache (LLC) solutions have been proposed to meet the wide-ranging performance and energy requirements of modern applications. These emerging hybrid solutions need simulation and detailed exploration to fully understand their capabilities before exploiting them. Existing simulation tools are either too slow or incapable of prototyping such systems and optimizing for NVM devices. To this end, we propose HyCSim, a trace-driven simulation infrastructure that enables rapid comparison of various hybrid LLC configurations for different optimization objectives. Notably, HyCSim makes it possible to quickly estimate the impact of various hybrid LLC insertion and replacement policies, and of disabling a cache region at byte or cache-frame granularity for different fault maps. In addition, HyCSim allows evaluating the impact of various compression schemes on the overall performance (hit and miss rate) and on the number of writes to the LLC. Our evaluation on ten multi-program workloads from the SPEC 2006 benchmark suite shows that HyCSim accelerates simulation by 24× compared to the cycle-accurate Gem5 simulator, with high fidelity. This work was partially funded by the HiPEAC collaboration grant 2020, the German Research Council (DFG) through the TraceSymm project (366764507) and the Co4RTM project (450944241), MCIN/AEI/10.13039/501100011033 (grants PID2019-105660RB-C21 and PID2019-107255GB-C22), and by the Aragón Government (T5820R research group).
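    As a rough illustration of what a trace-driven model like this does (a generic sketch, not HyCSim's actual code or interface): replay (address, is-write) records against a simplified cache model and accumulate hits, misses and NVM writes.

    // Generic trace-driven LLC model: replay a memory trace and count events.
    #include <cstdint>
    #include <iostream>
    #include <unordered_set>
    #include <vector>

    struct TraceRecord { uint64_t blockAddr; bool isWrite; };

    struct Stats { uint64_t hits = 0, misses = 0, nvmWrites = 0; };

    Stats replay(const std::vector<TraceRecord>& trace) {
        // Unbounded "cache" keyed by block address: enough to show the flow of a
        // trace-driven model; a real tool models sets, ways, fault maps, etc.
        std::unordered_set<uint64_t> resident;
        Stats s;
        for (const auto& r : trace) {
            if (resident.count(r.blockAddr)) {
                ++s.hits;
            } else {
                ++s.misses;
                resident.insert(r.blockAddr);  // allocate on miss
            }
            if (r.isWrite) ++s.nvmWrites;      // assume, for the sketch, writes land in NVM
        }
        return s;
    }

    int main() {
        std::vector<TraceRecord> trace{{0x40, false}, {0x40, true}, {0x80, false}};
        Stats s = replay(trace);
        std::cout << s.hits << " hits, " << s.misses << " misses, "
                  << s.nvmWrites << " NVM writes\n";
    }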